Stackdriver Workspaces (Cloud Operations)

In this lesson, we will discuss provisioning Stackdriver workspaces.

This is an introductory lesson on Stackdriver. Upcoming sections cover Stackdriver in more depth.

Google is known for acquiring businesses and integrating their services into its own. Stackdriver is one of these acquisitions. It started in 2012-13, and Google acquired Stackdriver in 2014.

In 2020, Stackdriver was renamed to Cloud Operations. As this is an introductory lesson, we will cover the uses of Stackdriver and all its offerings.

So, let’s explore Stackdriver.

Monitoring#

The first part of Cloud Operations is Monitoring, which monitors resources from one or more projects. A workspace is a tool for monitoring resources in one or more Google Cloud projects or AWS accounts. Yes, Stackdriver can also monitor AWS accounts using an AWS connector. Let’s go through the submenus of Monitoring.

To open Monitoring, go to Main menu > Operations > Monitoring.

Workspaces#

The monitoring workspace is a one-stop solution for all monitoring requirements. By default, a workspace monitors its host project (also known as the scoping project), and the workspace’s name is automatically set to the name of the host project.

There are two types of projects in the workspace.

  1. Host project/Scoping project: This stores all the Stackdriver workspace’s metadata.

  2. Monitored project: One or more workspaces can monitor a Google Cloud project or AWS account. A workspace always monitors its Google Cloud host project. In addition, you can configure a workspace to monitor up to one hundred Google Cloud projects and AWS accounts alongside the host project.
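The relationship between a workspace, its host project, and its monitored projects can be sketched as a small data model. This is only a toy illustration, not a Google Cloud API: the `Workspace` class, its methods, and the resource names (`project-w`, `project-a`, and so on) are hypothetical, and the limit of one hundred is simply the number quoted above.

```python
from dataclasses import dataclass, field

# Limit quoted in the lesson; treated here as an assumption, not an API constant.
MAX_MONITORED = 100


@dataclass
class Workspace:
    """Toy model of a workspace: one host project plus monitored resources."""

    host_project: str
    monitored: list = field(default_factory=list)

    @property
    def name(self) -> str:
        # The workspace name defaults to the name of the host project.
        return self.host_project

    def add(self, resource_id: str) -> None:
        """Attach another Google Cloud project or AWS account."""
        if resource_id == self.host_project or resource_id in self.monitored:
            return  # already monitored, nothing to do
        if len(self.monitored) >= MAX_MONITORED:
            raise ValueError("workspace already monitors the maximum number of resources")
        self.monitored.append(resource_id)

    def scope(self) -> list:
        """Everything the workspace monitors, host project first."""
        return [self.host_project, *self.monitored]


# Workspace W monitoring projects A, B and AWS account D (hypothetical IDs).
w = Workspace(host_project="project-w")
for resource in ("project-a", "project-b", "aws-account-d"):
    w.add(resource)
print(w.scope())
```

Note that the host project is always part of the scope, even though it is never added explicitly.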

Workspace W Monitoring Projects A,B and AWS Account D

Even though every project can have its own monitoring workspace, using a separate project dedicated to the monitoring workspace makes it easy to keep the entire monitoring workload in one place.

Dashboard#

The dashboard is a combination of charts that shows metrics based on resource type and filters you choose. There are mainly four types of dashboards in GCP.

  1. Created by GCP: Google Cloud creates default dashboards for the services you use. These dashboards have the type Google Cloud Platform. Users cannot delete these dashboards.

  2. Custom: Dashboard created by a user. If default dashboards do not serve the purpose, we can create custom dashboards.

  3. Application: When third-party services are installed on GCP resources, this dashboard provides detailed information about that service.

  4. Amazon Web Services: This dashboard monitors products provided by Amazon.

You can create different dashboards to monitor different services. At this point, there is one monitoring dashboard, for Cloud Storage, which shows statistics about the buckets we created earlier. You can monitor resource-specific attributes such as requests, network traffic, and number of objects.

For example, suppose you want to monitor the CPU utilization of all the compute instances using a dashboard.

First, create a dashboard with a name that denotes its purpose. Then click “add charts” and select the resource types and other filters based on your requirements.

  1. Rename the dashboard. You can either use the ADD CHART button or drag charts from the library.

  2. The left pane lists the metrics that you can use to monitor filtered data. Keep the defaults for now.

  3. The dashboard is created as soon as you add the chart. Since there are no VMs, you won't see any chart yet.

  4. This is what the charts look like when there is enough data.
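The CPU utilization dashboard from this example can also be described as a JSON definition rather than built by clicking through the console. The sketch below is only an illustration: the field names (`displayName`, `gridLayout`, `xyChart`, `timeSeriesFilter`) follow the Cloud Monitoring dashboards API as we understand it, while the helper `cpu_dashboard`, the dashboard name, the chart title, and the 60-second alignment period are assumptions chosen for the example.

```python
import json


def cpu_dashboard(display_name: str) -> dict:
    """Build a one-chart dashboard definition for VM CPU utilization."""
    # Filter: CPU utilization metric, restricted to Compute Engine instances.
    cpu_filter = (
        'metric.type="compute.googleapis.com/instance/cpu/utilization" '
        'resource.type="gce_instance"'
    )
    chart = {
        "title": "VM CPU utilization",
        "xyChart": {
            "dataSets": [
                {
                    "timeSeriesQuery": {
                        "timeSeriesFilter": {
                            "filter": cpu_filter,
                            # Average each series over one-minute windows (assumed).
                            "aggregation": {
                                "alignmentPeriod": "60s",
                                "perSeriesAligner": "ALIGN_MEAN",
                            },
                        }
                    }
                }
            ]
        },
    }
    return {"displayName": display_name, "gridLayout": {"widgets": [chart]}}


dashboard = cpu_dashboard("compute-cpu-utilization")
print(json.dumps(dashboard, indent=2))
```

A definition like this could be saved to a file and, assuming the Cloud SDK is installed and authenticated, submitted with `gcloud monitoring dashboards create --config-from-file=<file>`. Check the current API reference before relying on the exact field names.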

Services#

This section is aimed more at architects. Services let you monitor a particular service against a “Service Level Objective” (SLO). An SLO is a business term for how reliably we want to serve our customers, for example the share of time the service must be available without interruption. Suppose we have committed to 99.5% uptime for a particular application. Then, out of the 720 hours in a 30-day month, our service should be up for a minimum of 716.4 hours. The only requirement is that the user-defined service is deployed using Google Kubernetes Engine. Services help us achieve business-level SLOs.
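The uptime arithmetic above is easy to check in a few lines. This is a minimal sketch: the function names are our own, and the default of 720 hours assumes a 30-day month.

```python
def required_uptime_hours(slo_percent: float, period_hours: float = 720) -> float:
    """Minimum hours the service must be up to meet the SLO."""
    return period_hours * slo_percent / 100


def allowed_downtime_hours(slo_percent: float, period_hours: float = 720) -> float:
    """Maximum hours the service may be down without breaking the SLO."""
    return period_hours * (100 - slo_percent) / 100


print(required_uptime_hours(99.5))   # 716.4
print(allowed_downtime_hours(99.5))  # 3.6
```

In other words, a 99.5% monthly SLO leaves a downtime budget of only 3.6 hours.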

Metrics explorer#

Metrics Explorer is a playground for analyzing and creating charts for different metrics. You can select any metrics related to the resource type and add them to any dashboard as a chart.

  1. Select the service or resource to monitor, e.g., VM, GCS, GKE, etc.

  2. Select the metric related to that service for analysis. For this demo, select the GCS object count.

  3. You can add more metrics or add the current visualization to a dashboard using the SAVE CHART button.

  4. For now, add it to the existing dashboard. Don't worry about the dashboard name. You can always rename the dashboard.

  5. Go back to the dashboard menu and confirm the addition of the chart.

Alerting#

As the name says, this sends an alert to the configured notification channels based on a policy.

A policy tracks a set of conditions that can be defined using either an uptime check or some metrics. As an industry standard, it is a good practice to document the possible steps to fix the alert/issue.

When a condition is met, the Alerting service sends a message to every configured notification channel, alerting the user about the event or condition that was triggered.
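The flow described above (a policy holds a condition, and when the condition is met every notification channel receives a message) can be sketched in plain Python. This is a toy model, not the Cloud Monitoring API: the `Policy` class, the `evaluate` function, the 0.8 CPU threshold, and the email address are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Policy:
    """Toy alerting policy: a condition, channels, and fix-it documentation."""

    name: str
    condition: Callable[[Dict[str, float]], bool]  # True when the alert should fire
    channels: List[str]
    documentation: str = ""  # suggested steps to fix the issue


def evaluate(policy: Policy, metrics: Dict[str, float],
             send: Callable[[str, str], None]) -> bool:
    """Notify every channel if the condition is met; return whether it fired."""
    if not policy.condition(metrics):
        return False
    for channel in policy.channels:
        send(channel, f"[{policy.name}] {policy.documentation}")
    return True


sent = []  # stands in for real notification delivery
high_cpu = Policy(
    name="high-cpu",
    condition=lambda m: m["cpu_utilization"] > 0.8,
    channels=["email:oncall@example.com"],
    documentation="Check for runaway processes, then resize the VM if needed.",
)
fired = evaluate(high_cpu, {"cpu_utilization": 0.93},
                 lambda channel, message: sent.append((channel, message)))
print(fired, sent)
```

Carrying the documentation along with the notification reflects the practice mentioned above: whoever receives the alert also receives the suggested steps to fix it.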

  • Click on “Alerting.” You will get a form to select a condition for triggering the alert.

  • You can create alerts based on specific metrics, uptime checks, or the status of a process.

  • The Alerting service supports several notification channels. Let’s configure an email notification channel. We will use this email channel in the uptime check section.

  1. Click on the EDIT NOTIFICATION CHANNELS button.

  2. Scroll down to the Email section and add the email address that should receive the notifications.

Uptime checks#

Uptime checks monitor a particular service, app, or resource. When that service goes down, an incident is created, and based on that incident, an alert is sent to the configured notification channels.

Uptime checks are also used to create alerting policies. In the KPI measurements of service, uptime checks play an important role.

Uptime checks can be configured using three protocols. Those are:

  1. HTTP: Any HTTP endpoint that is hosted on a public IP address and reachable via a standard HTTP request can be tracked for its uptime.

  2. HTTPS: Any HTTPS website that is hosted publicly or reachable via a browser can be tracked. You can track the uptime of https://google.com, and you can track the uptime of your own website in the same way.

  3. TCP: Any other TCP-based application with a reachable port. These applications are tracked by periodically establishing a connection to the port. If the connection cannot be established, the alert is triggered.
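To make the TCP case concrete, the sketch below shows the kind of probe such a check performs: try to open a connection to a port and report success or failure. It is not how Cloud Monitoring implements uptime checks, only an illustration using Python's standard `socket` module. The function name `tcp_check` and the timeouts are assumptions, and the demo targets a throwaway local listener so that it needs no network access.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Demo: start a local listener, probe it, stop it, and probe again.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

up = tcp_check("127.0.0.1", port)                 # listener is running
listener.close()
down = tcp_check("127.0.0.1", port, timeout=1.0)  # listener is gone
print(up, down)
```

A real uptime check repeats such a probe periodically and from several regions, and hands failures to the alerting policy instead of printing them.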

Uptime checks also suggest possible alerts that need to be configured. Uptime checks support the Email, SMS, Slack (Beta), PagerDuty (Beta), Cloud Pub/Sub (Beta), and Webhooks (Beta) notification channels. You can configure any of them.

Whenever we create an uptime check, the Alerting service automatically creates a failure policy for it. The failure policy is triggered by the failure of the uptime check: if the uptime check fails, the failure policy’s condition is met, and the Alerting service sends a notification to the configured notification channels.

As we have not created any service for ourselves, let’s track the uptime of https://google.com for demo purposes.

  1. Click on the CREATE UPTIME CHECK button.

  2. Provide a name for the check. Select HTTPS and provide the Google URL. Click on MORE TARGET OPTIONS.

  3. Make sure that the validate SSL checkbox is checked. Also, the authentication fields should not contain a username or password.

  4. Keep the defaults for the Response Validation section.

  5. In the alert & notification section, select the email notification channel configured earlier and click on the create button.

  6. After about a minute, you will see something like this for the created uptime check.

  7. Reference image in case the policy fails for some region.

  8. Click on the created uptime check after some time to see the statistics. You will also see the automatically created alert policy.

Groups#

Within a workspace, you can use Groups if you have to group only particular resources and monitor them separately.

For example, a service named “Image uploader” uses a cloud function, a bucket, and a PubSub topic; you can group and monitor all these resources simultaneously.

Groups also provide different filters to choose specific resources to meet the requirement.
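The idea of selecting resources by a filter can be sketched as follows. This is a toy illustration only: real groups are defined in the Monitoring console, and the `Resource` class, the `group` helper, the `app` label, and all resource names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Toy monitored resource with a name, a kind, and free-form labels."""

    name: str
    kind: str
    labels: dict = field(default_factory=dict)


def group(resources, **label_filter):
    """Return the resources whose labels match every key/value in the filter."""
    return [
        r for r in resources
        if all(r.labels.get(key) == value for key, value in label_filter.items())
    ]


resources = [
    Resource("resize-image", "cloud_function", {"app": "image-uploader"}),
    Resource("uploads", "gcs_bucket", {"app": "image-uploader"}),
    Resource("upload-events", "pubsub_topic", {"app": "image-uploader"}),
    Resource("billing-export", "gcs_bucket", {"app": "billing"}),
]

image_uploader = group(resources, app="image-uploader")
print([r.name for r in image_uploader])
```

Only the three resources that belong to the “Image uploader” service end up in the group, so they can be monitored together while the unrelated bucket is left out.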

Settings#

Settings is the admin panel for Monitoring. It is used to carry out admin tasks such as configuring AWS accounts, adding other projects to the workspace, removing or moving projects from the current workspace, and merging other workspaces. It also contains documentation on how to set up agents for logging and monitoring different resources.

  • Since a workspace can monitor more than one GCP project or AWS account, you can add a project or account to the existing workspace by clicking the ADD GCP PROJECTS or ADD AWS ACCOUNT buttons.

  • Since one project can be part of multiple workspaces, we can merge other workspaces into the current one. But remember, merging another workspace into the current workspace will delete all of the merged workspace’s configuration. To merge another workspace, click on the MERGE button.

Workspaces settings.

Overview#

Finally, we have the Overview tab. As the name suggests, it provides an at-a-glance view of all the other tabs: this monitoring dashboard displays all the policies, uptime checks, alerts, and groups.

The monitoring part of Cloud Operations is very useful when there is a production workload. Different services are used to monitor production applications, and Monitoring is the GCP-specific service that automatically collects statistics about your GCP workloads. Of all the services above, alerting is the most important because it notifies users of any unplanned downtime or application failure. This has been a brief overview of the Cloud Operations monitoring service.
